95 research outputs found

Promoter prediction in E. coli based on SIDD profiles and Artificial Neural Networks

Author: A Kanhere
Abigail S Newsome
Aleksandra A Markovets
AM Huerta
Charles Bland
GZ Hertz
H Wang
H Wang
L Kozobay-Avraham
V Rangannan
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract. Background: One of the major challenges in biology is the correct identification of promoter regions. Computational methods based on motif searching have been the traditional approach. Recent studies have shown that DNA structural properties, such as curvature, stacking energy, and stress-induced duplex destabilization (SIDD), are also useful in promoter prediction. In this paper, the currently used SIDD energy threshold method is compared to the proposed artificial neural network (ANN) approach for finding promoters based on SIDD profile data. Results: When compared to the SIDD threshold prediction method, artificial neural networks showed noticeable improvements in precision, recall, and F-score over a range of values. The maximal F-score was 62.3 for the ANN classifier and 56.8 for the threshold-based classifier. Conclusions: Artificial neural networks were used to predict promoters based on SIDD profile data. Results using this technique were an improvement over the previous SIDD threshold approach. Over a wide range of precision-recall values, artificial neural networks were more capable of identifying distinctive characteristics of promoter regions than threshold-based methods.
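
The precision, recall, and F-score figures quoted in this abstract can be illustrated with a minimal sketch. It assumes a threshold classifier that calls a site a promoter when its SIDD energy is at or below a cutoff (lower energy meaning a more destabilized duplex). The function names, the cutoff, and the toy data are hypothetical and are not taken from the paper.

```python
def threshold_classifier(energies, threshold):
    """Predict 'promoter' (True) wherever the SIDD energy is at or below the cutoff.

    Assumption: lower energy means a more destabilized, promoter-like site.
    """
    return [energy <= threshold for energy in energies]


def precision_recall_f(true_labels, predicted_labels):
    """Return (precision, recall, F-score) for two equal-length boolean lists."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F-score: harmonic mean of precision and recall
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): five candidate sites with known labels.
labels = [True, True, False, False, True]
energies = [1.0, 8.0, 2.0, 9.0, 3.0]
predictions = threshold_classifier(energies, threshold=4.0)
precision, recall, f_score = precision_recall_f(labels, predictions)
```

Sweeping the threshold and recording the F-score at each value gives the kind of precision-recall range over which the two classifiers are compared.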

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Analysis of computational approaches for motif discovery

Author: CT Workman
DK Neal
G Pavesi
GEP Box
GZ Hertz
JD Hughes
M Burset
M Tompa
Martin Tompa
Nan Li
P McCullagh
S Sinha
TL Bailey
V Matys
Publication venue: BioMed Central
Publication date: 01/01/2006

Recently, we performed an assessment of 13 popular computational tools for discovery of transcription factor binding sites (M. Tompa, N. Li, et al., "Assessing Computational Tools for the Discovery of Transcription Factor Binding Sites", Nature Biotechnology, Jan. 2005). This paper contains follow-up analysis of the assessment results, and raises and discusses some important issues concerning the state of the art in motif discovery methods: 1. We categorize the objective functions used by existing tools, and design experiments to evaluate whether any of these objective functions is the right one to optimize. 2. We examine various features of the data sets that were used in the assessment, such as sequence length and motif degeneracy, and identify which features make data sets hard for current motif discovery tools. 3. We identify an important feature that has not yet been used by existing tools and propose a new objective function that incorporates this feature.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

RecMotif: a novel fast algorithm for weak motif discovery

Author: A Price
CE Lawrence
E Fratkin
E Wijaya
FP Roth
G Pavesi
G Wang
GD Stormo
GZ Hertz
GZ Hertz
He Quan Sun
HJ Bussemaker
J Buhler
J Davila
J Davila
Jagath C Rajapakse
L Ming
Malcolm Yoke Hean Low
MF Sagot
PA Pevzner
S Liang
S Rajasekaran
S Sinha
SH Sze
TL Bailey
U Keich
Wen Jing Hsu
X Yang
Z Yao
Publication venue: BioMed Central
Publication date: 01/01/2011

Crossref

Springer - Publisher Connector

PubMed Central

Efficient and accurate P-value computation for Position Weight Matrices

Author: A Liefooghe
C Pizzi
E Wingender
G Bejerano
GE Crooks
GZ Hertz
H Huang
Hélène Touzet
J Zhang
Jean-Stéphane Varré
JM Claverie
K Malde
M Beckstette
M Garey
R Staden
S Mount
S Rahmann
TD Wu
Publication venue: BioMed Central
Publication date: 01/01/2007

Abstract. Background: Position Weight Matrices (PWMs) are probabilistic representations of signals in sequences. They are widely used to model approximate patterns in DNA or in protein sequences. Using PWMs requires knowing the statistical significance of a word according to its score. This is done by defining the P-value of a score, which is the probability that the background model can achieve a score larger than or equal to the observed value. This gives rise to the following problem: given a P-value, find the corresponding score threshold. Existing methods rely on dynamic programming or probability generating functions. For many examples of PWMs, they fail to give accurate results in a reasonable amount of time. Results: The contribution of this paper is twofold. First, we study the theoretical complexity of the problem, and we prove that it is NP-hard. Then, we describe a novel algorithm that solves the P-value problem efficiently. The main idea is to use a series of discretized score distributions that improves the final result step by step until some convergence criterion is met. Moreover, the algorithm is capable of calculating the exact P-value without any error, even for matrices with non-integer coefficient values. The same approach is also used to devise an accurate algorithm for the reverse problem: finding the P-value for a given score. Both methods are implemented in software called TFM-PVALUE, which is freely available. Conclusion: We have tested TFM-PVALUE on a large set of PWMs representing transcription factor binding sites. Experimental results show that it achieves better performance in terms of computational time and precision than existing tools.
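
To make the P-value definition in this abstract concrete, here is a minimal sketch of the plain dynamic-programming approach for a PWM with integer scores and an independent (zeroth-order) background model. This is the baseline style of method the abstract mentions, not the TFM-PVALUE algorithm itself. The function names and the tiny two-column matrix are hypothetical.

```python
from collections import defaultdict


def score_distribution(pwm, background):
    """Exact distribution of total PWM score under an independent background.

    pwm: list of columns, each a dict mapping base -> integer score.
    background: dict mapping base -> probability (summing to 1).
    """
    dist = {0: 1.0}  # before any column is read, the score is 0 with probability 1
    for column in pwm:
        nxt = defaultdict(float)
        for score, prob in dist.items():
            for base, s in column.items():
                nxt[score + s] += prob * background[base]
        dist = dict(nxt)
    return dist


def p_value(pwm, background, threshold):
    """Probability that a background word scores at least `threshold`."""
    dist = score_distribution(pwm, background)
    return sum(prob for score, prob in dist.items() if score >= threshold)


def score_threshold(pwm, background, target_p):
    """Lowest score whose P-value does not exceed target_p (None if there is none)."""
    dist = score_distribution(pwm, background)
    cumulative, best = 0.0, None
    for score in sorted(dist, reverse=True):
        cumulative += dist[score]
        if cumulative > target_p:
            break
        best = score
    return best


# Hypothetical two-column matrix with a uniform background.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
toy_pwm = [
    {"A": 2, "C": 0, "G": 0, "T": -1},
    {"A": 1, "C": 1, "G": 0, "T": 0},
]
```

With real-valued matrices the scores must first be rounded to a grid, which is where the accuracy and running-time trade-off described in the abstract arises.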

HAL - Lille 3

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

INRIA a CCSD electronic archive server

PubMed Central

A Careful Look at Binding Site Reorganization in the even-skipped Enhancers of Drosophila and Sepsids

Author: A Erives
Brant K. Peterson
D Stanojevic
DN Arnosti
EE Hare
EL Sonnhammer
Emily E. Hare
GZ Hertz
J Crocker
M Markstein
Michael B. Eisen
Norbert Perrimon
NS Wratten
S Fisher
S Gray
S Gray
S Small
S Small
Publication venue: Public Library of Science
Publication date: 01/11/2008
Field of study

Organismic and Evolutionary Biolog

Crossref

Harvard University - DASH

Directory of Open Access Journals

PubMed Central

Ultraconserved Elements in the Olig2 Promoter

Author: A Meissner
A Sandelin
A Woolfe
AG Rosmarin
B Hu
Barak A. Cohen
BG Novitch
C Hildebrand
Christina T. L. Chen
CT Chen
D Karolchik
David I. Gottlieb
DC Bauer
G Bain
G Bain
G Bejerano
GZ Hertz
GZ Hertz
H Akaike
H Li
H Takebayashi
H Takebayashi
H Takebayashi
HC Park
HQ Xian
HQ Xian
IJ Donaldson
JA Girault
JW Thomas
K Huang
K Murakami
K Takahashi
KL Ligon
L Georgieva
M Sugimori
M Yao
N Billon
O Brustle
OJ Bronchain
Olaf Sporns
QR Lu
QR Lu
QR Lu
S Kyo
S Liu
T Kuhlmann
T Sun
T Vavouri
V Matys
X Zhang
Y Nakatake
Y Uchida
ZW Du
Publication venue: Public Library of Science
Publication date: 01/01/2008

. basal promoter and found that it represses expression in undifferentiated embryonic stem cells. expression during development

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Digital Commons@Becker

SEARCHPATTOOL: a new method for mining the most specific frequent patterns for binding sites with application to prokaryotic DNA sequences

Author: A Brazma
A Califano
B Brejova
DR Cavener
E Eskin
Fathi Elloumi
FP Roth
G Pavesi
G Thijs
GZ Hertz
H Salgado
I Jonassen
I Rigoutsos
I Rigoutsos
J Van Helden
M Burset
M Tompa
Martha Nason
PA Pevzner
PA Pevzner
R Agrawal
S Sinha
S Sinha
TL Bailey
Y Makita
Publication venue: BioMed Central
Publication date: 01/01/2007

Abstract. Background: Computational methods to predict transcription factor binding sites (TFBS) based on exhaustive algorithms are guaranteed to find the best patterns but are often limited to short ones or impose some constraints on the pattern type. Many patterns for binding sites in prokaryotic species are not well characterized but are known to be large, between 16 and 30 base pairs (bp), and to contain at least 2 conserved bases. The length of prokaryotic promoters (about 400 bp) and our interest in studying a small set of genes that could be a cluster of co-regulated genes from microarray experiments led to the development of a new exhaustive algorithm targeting these large patterns. Results: We present Searchpattool, a new method to search for and select the most specific (conservative) frequent patterns. This method does not impose restrictions on the density or the structure of the pattern. The best patterns (motifs) are selected using several statistics, including a new application of a z-score based on the number of matching sequences. We compared Searchpattool against other well-known algorithms on a Bacillus subtilis group of 14 input sequences and found that in our experiments Searchpattool always performed the best based on performance scores. Conclusion: Searchpattool is a new method for pattern discovery relative to transcription factor binding sites for species or genes with short promoters. It outputs the most specific significant patterns and helps the biologist to choose the best candidates.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Analysis of Gene Regulatory Networks in the Mammalian Circadian Rhythm

Author: AE Kel
AI Su
B Kornmann
BH Miller
BR Zeeberg
Chunxuan Shao
FO James
G Thijs
GZ Hertz
H Wakaguri
Haifang Wang
HR Ueda
HR Ueda
Jeffrey M. Gimble
Jun Yan
K Bozek
KD Pruitt
L Yin
M Rakhshandehroo
P Carninci
S Aerts
S Panda
S Rahmann
SM Reppert
V Porterfield
Y Suzuki
Yuting Liu
Publication venue: Public Library of Science
Publication date: 01/10/2008

Circadian rhythm is fundamental in regulating a wide range of cellular, metabolic, physiological, and behavioral activities in mammals. Although a small number of key circadian genes have been identified through extensive molecular and genetic studies in the past, the existence of other key circadian genes and how they drive the genome-wide circadian oscillation of gene expression in different tissues remain unknown. Here we try to address these questions by integrating all available circadian microarray data in mammals. We identified 41 common circadian genes that showed circadian oscillation in a wide range of mouse tissues with a remarkable consistency of circadian phases across tissues. Comparisons across mouse, rat, rhesus macaque, and human showed that the circadian phases of known key circadian genes were delayed by 4–5 hours in rat compared to mouse and by 8–12 hours in macaque and human compared to mouse. A systematic gene regulatory network for the mouse circadian rhythm was constructed after incorporating promoter analysis and transcription factor knockout or mutant microarray data. We observed a significant association of the cis-regulatory elements EBOX, DBOX, RRE, and HSE with the different phases of circadian oscillating genes. The analysis of the network structure revealed the paths through which light, food, and heat can entrain the circadian clock and identified that NR3C1 and FKBP/HSP90 complexes are central to the control of circadian genes through diverse environmental signals. Our study improves our understanding of the structure, design principle, and evolution of gene regulatory networks involved in the mammalian circadian rhythm.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Wide-Scale Analysis of Human Functional Transcription Factor Binding Reveals a Strong Bias towards the Transcription Start Site

Author: A Ambesi-Impiombato
A Blais
A Eto
A Subramanian
AE Kel
AG Clark
AL Lam
AM McGuire
Anat Reiner
Assif Yitzhaky
B Ren
C Kimura-Yoshida
C Plessy
C Yang
CT Harbison
D Pfeifer
D Wang
DB Allison
E Emberly
E Segal
Eytan Domany
FP Roth
GC Pipes
GC Yuan
GQ Yao
GZ Hertz
H Li
H Lodish
J Zheng
JD Hughes
JL DeRisi
JQ Ling
K Frech
K Quandt
KD MacIsaac
L Amir-Zilberstein
L Elnitski
L Marino-Ramirez
L McCue
M Ashburner
M Kellis
M Milyavsky
MA Nobrega
Mark Koudritsky
MC Frith
ML Howard
ML Whitfield
N Rajewsky
Or Zuk
P Carninci
P Carninci
P Cliften
PM Haverty
PR Buckland
R Elkon
R Liu
R Sharan
Ran Brosh
S Aerts
S Rashi-Elkeles
S Tavazoie
SJ Cooper
SJ Ho Sui
Sui Huang
U Gerland
Varda Rotter
WW Wasserman
X Xie
Y Barash
Y Benjamini
Y Benjamini
Y Tabach
Yossi Buganim
Yuval Tabach
Z Wang
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2007

We introduce a novel method to screen the promoters of a set of genes with shared biological function against a precompiled library of motifs, and find those motifs which are statistically over-represented in the gene set. The gene sets were obtained from the functional Gene Ontology (GO) classification; for each set and motif we optimized the sequence similarity score threshold, independently for every location window (measured with respect to the TSS), taking into account the location-dependent nucleotide heterogeneity along the promoters of the target genes. We performed a high-throughput analysis, searching the promoters (from 200 bp downstream to 1000 bp upstream of the TSS) of more than 8000 human and 23,000 mouse genes, for 134 functional Gene Ontology classes and for 412 known DNA motifs. When combined with binding site and location conservation between human and mouse, the method identifies with high probability functional binding sites that regulate groups of biologically related genes. We found many location-sensitive functional binding events and showed that they clustered close to the TSS. Our method and findings were put to several experimental tests. By allowing a "flexible" threshold and combining our functional class and location-specific search method with conservation between human and mouse, we are able to reliably identify functional TF binding sites. This is an essential step towards constructing regulatory networks and elucidating the design principles that govern transcriptional regulation of expression. The promoter region proximal to the TSS appears to be of central importance for regulation of transcription in human and mouse, just as it is in bacteria and yeast. Comment: 31 pages, including Supplementary Information and figure
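
The idea of a motif being "statistically over-represented in the gene set" can be sketched with a one-sided hypergeometric test. This is an assumption made for illustration only: the abstract does not say which statistic the authors used, and the function name and toy counts below are hypothetical.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: number of genes in the background set
    K: background genes whose promoter contains the motif
    n: number of genes in the functional gene set
    k: gene-set genes whose promoter contains the motif
    """
    total = comb(N, n)
    upper = min(n, K)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible combinations contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Toy counts (hypothetical): 10 background genes, 4 carry the motif,
# and 2 of the 3 genes in the set of interest carry it.
tail_probability = hypergeom_tail(k=2, n=3, K=4, N=10)
```

A small tail probability suggests the motif occurs in the gene set more often than chance sampling from the background would explain. In a screen over many gene sets and motifs, such values would also need a multiple-testing correction.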

arXiv.org e-Print Archive

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

The value of position-specific priors in motif discovery using MEME

Author: BC Foat
CT Harbison
DC Bauer
E Redhead
F Fang
FA Buske
GD Stormo
GZ Hertz
KD MacIsaac
L Narlikar
L Narlikar
MC Frith
Mikael Bodén
Philip Machanick
R Gordân
R Siddharthan
RC McLeay
S Sinha
Timothy L Bailey
TL Bailey
TL Bailey
TL Bailey
Tom Whitington
V Matys
WH Kruskal
WJ Kent
X Chen
Y Barash
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract. Background: Position-specific priors have been shown to be a flexible and elegant way to extend the power of Gibbs sampler-based motif discovery algorithms. Information of many types, including sequence conservation, nucleosome positioning, and negative examples, can be converted into a prior over the location of motif sites, which then guides the sequence motif discovery algorithm. This approach has been shown to confer many of the benefits of conservation-based and discriminative motif discovery approaches on Gibbs sampler-based motif discovery methods, but has not previously been studied with methods based on expectation maximization (EM). Results: We extend the popular EM-based MEME algorithm to utilize position-specific priors and demonstrate their effectiveness for discovering transcription factor (TF) motifs in yeast and mouse DNA sequences. Utilizing a discriminative, conservation-based prior dramatically improves MEME's ability to discover motifs in 156 yeast TF ChIP-chip datasets, more than doubling the number of datasets where it finds the correct motif. On these datasets, MEME using the prior has a higher success rate than eight other conservation-based motif discovery approaches. We also show that the same type of prior improves the accuracy of motifs discovered by MEME in mouse TF ChIP-seq data, and that the motifs tend to be of slightly higher quality than those found by a Gibbs sampling algorithm using the same prior. Conclusions: We conclude that using position-specific priors can substantially increase the power of EM-based motif discovery algorithms such as MEME.
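
As a rough illustration of how a prior over site locations can guide an EM-style motif finder, the sketch below weights each candidate start position's motif-versus-background likelihood ratio by its prior probability and normalizes. It assumes exactly one site per sequence. It is a simplified stand-in, not MEME's actual implementation, and the function name and numbers are hypothetical.

```python
def site_posterior(likelihood_ratios, prior):
    """Posterior probability that the motif site starts at each position.

    likelihood_ratios: motif-vs-background likelihood ratio per start position.
    prior: position-specific prior probability per start position.
    Assumes exactly one site per sequence and at least one non-zero product.
    """
    weighted = [ratio * p for ratio, p in zip(likelihood_ratios, prior)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Three candidate positions (hypothetical numbers).
ratios = [1.0, 4.0, 1.0]
uniform_prior = [1 / 3, 1 / 3, 1 / 3]
informed_prior = [0.6, 0.2, 0.2]  # e.g. position 0 lies in a conserved region

flat = site_posterior(ratios, uniform_prior)
guided = site_posterior(ratios, informed_prior)
```

With a uniform prior the posterior follows the likelihood ratios alone. An informative prior shifts probability mass toward positions supported by the external evidence, which is the mechanism by which conservation or other data can steer the search.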

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central